This memorandum presents the implementation choices for the
proposed new Multics Storage System described in MTB-055.

The reader is assumed to be generally familiar with the
operation of the current Multics Storage System.



Five problems with the current Storage System were
identified in MTB-017, and five goals proposed. These were:

1. PROBLEM: The Storage System loses information.

GOAL: Eliminate loss of information by reducing the number
of crashes, by limiting the damage done by crashes, and by
minimizing loss of information during recovery procedures.

2. PROBLEM: Backup and recovery procedures cost too much.

GOAL: Minimize system down time and devote fewer resources
to backup functions.

3. PROBLEM: Large amounts of storage cannot be handled.

GOAL: Make extremely large storage configurations usable
without imposing a penalty in performance, reliability, or
availability.

4. PROBLEM: Several desirable features should be added,
including support for removable disk packs.

GOAL: Add support for removable disk packs and other
features.

5. PROBLEM: The operator interface is deficient.

GOAL: Improve the operator interface, especially in the
areas of shrinking and expanding the device complement,
operating a crippled system, and providing recovery
information.



Multics Project internal working documentation. Not to be
reproduced or distributed outside the Multics Project.
Overview of the Proposal

The physical storage available on a Multics configuration
will be grouped into partitions, as it is now. The MULT
partition will be further subdivided into logical volumes, which
may consist of part of a physical volume, or several physical
volumes. All physical volumes which comprise a logical volume
must be mounted or dismounted at the same time, so that a logical
volume may not be partially mounted; but logical volumes can be
added to or removed from the MULT partition while the system is
running. A physical volume may contain storage for only one
logical volume; the reason for allowing "fractional" physical
volumes is to accommodate the DUMP, LOG, SALV and BOS partitions
without requiring that the minimum system configuration have two
volumes.



[Figure 1: Example of Physical Volume Usage. The figure shows six
packs: Packs 1 and 2 are labeled Root, Packs 3, 4 and 5 are
labeled Public, and Pack 6 is labeled Private. BOS and SALV
areas are also shown.]

Every segment in the new hierarchy will have all its pages
allocated on the same physical volume. All segments in the same
directory will be contained in the same logical volume.

Each physical volume has a label, recorded by a special BOS
utility, which describes the storage extents on the volume and
the name of the logical volume it provides storage for. A
physical volume is part of only one logical volume. Each
physical volume has a volume unique identifier (VOLUID), used by
the system to identify the volume.

Each physical volume in the new Storage System has a Volume
Table of Contents (VTOC) which contains an entry for every
segment on the volume. The VTOC entries contain the information,
formerly present in the directory branch, which describes the
physical storage occupied by the segment. All pages of a segment
will reside on the same volume. Each volume also contains a
Volume Map, which has an entry for each page on the volume
describing its current status.

There is no FSDCT in the new Storage System. The Volume
Maps and the configuration deck provide the information which
used to be contained in the FSDCT. Volumes listed in the
configuration deck are called permanent volumes, and cannot be
dismounted. All directories must reside on one permanent logical
volume designated in the configuration deck; no directory may
ever be off line. The logical volume which contains the root may
consist of several physical volumes, and a configuration need
have only this one logical volume defined.

The directory branch and the VTOC entry for a segment are
connected by a VTOC pointer stored in the branch. This pointer
is a pair of 36-bit quantities which specify the VOLUID and the
location within the VTOC where the VTOC entry resides. The VTOC
pointer's second component, the VTOC index, is only interpreted
within the context of the specified volume. Both the branch and
the VTOC entry contain the unique ID of the segment, and both
unique ID's must match if the system is to consider the
association valid.
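
The branch-to-VTOC association can be illustrated with a short
Python sketch. It is only an illustration under stated
assumptions: the names Branch, VtocEntry and find_vtoc_entry are
hypothetical, and each volume's VTOC is modeled as a dictionary
keyed by VTOC index rather than as an on-disk structure.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    uid: int         # unique ID of the segment, as recorded in the branch
    voluid: int      # first half of the VTOC pointer
    vtoc_index: int  # second half; meaningful only within that volume


@dataclass
class VtocEntry:
    uid: int         # unique ID of the segment, as recorded in the VTOC


def find_vtoc_entry(branch, volumes):
    """Return the VTOC entry for a branch, or None if the association is invalid.

    `volumes` maps VOLUID -> {VTOC index -> VtocEntry} (a stand-in for
    the on-line volumes; hypothetical layout).
    """
    vtoc = volumes.get(branch.voluid)
    if vtoc is None:
        return None                      # volume is not on-line
    entry = vtoc.get(branch.vtoc_index)  # index interpreted per volume
    if entry is None or entry.uid != branch.uid:
        return None                      # unique IDs must match
    return entry


volumes = {0o1234: {7: VtocEntry(uid=0o555)}}
good = Branch(uid=0o555, voluid=0o1234, vtoc_index=7)
stale = Branch(uid=0o556, voluid=0o1234, vtoc_index=7)
```

Here find_vtoc_entry(good, volumes) returns the entry, while the
lookup for stale fails because the two unique ID's differ.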

The system will maintain a table in wired-down storage known
as the Device Table, which has one entry per disk drive in the
configuration, specifying the VOLUID for the volume mounted on
the drive, the DIM parameters necessary to run the drive, and
other data.

The system will also have a more extensive ring 1 data base
which registers each logical volume known at the installation,
lists the physical volumes involved and the volume owner, and
provides an access control list for mounting control.
The focus during the initial design of the new Storage
System will be on getting a version which runs and is
functionally correct, in as short a time as possible. Adequate
system performance is also an important functional requirement.
But some functional extensions will be postponed to later phases
of system development in order to get the new system on the air
quickly.



Data Bases

Wherever possible, data bases will be modified in a
compatible fashion, leaving previously-defined items where they
were. For example, the directory branch need not change size;
the removal of file maps will be compensated for by the addition
of a VTOC pointer, but no attempt will be made to (say)
restructure ACL's.



Supervisor Calls

It is hoped that all current supervisor calls will continue
to function exactly as they do now, with one or two exceptions.
A few new entries will be added, and one or two new status codes
may be possible (for example, "logical volume not on-line").



Algorithms

Straightforward code will be much easier to debug and
maintain, so our preference will be to implement the new Storage
System with less mechanism rather than more. This is especially
true for the first phase of the implementation.

For instance, the current system has a fairly complicated
mechanism for allocating variable-sized file maps in the
directory. When file maps are moved to the VTOC, this strategy
will be eliminated, and each VTOC entry will contain space for
the maximum-size file map.



Security Considerations

The additional security controls described in MTB-086 will
be supported by the new Storage System. Care will be taken to
insure that no new ways of communicating between users of
different access authorizations are introduced.

In order to prevent unauthorized communication between users
with different attributes, by means of quota manipulation or
signalling by mounting and dismounting, logical volumes which can
be dismounted may contain segments from only one sensitivity
level and category set. The level, category set, and minimum
ring number for each dismountable volume are kept in the on-line
logical volume registration data, and are also recorded in the
volume labels.

To guard against accidental disclosure of information in the
event of a system failure, the new Storage System takes
considerable care to avoid re-used addresses. Also, all free
pages on disk are explicitly required to be zero, so that even if
an unused page is mistakenly added to a segment as a result of a
system crash or a dropped bit, the system will not compromise
security.



Sizes of Fields

One problem the new Storage System solves is that of
providing for much more physical storage in a Multics
configuration than the current supervisor can handle. Part of
the current problem arises because we wish to support future
hardware enhancements which may provide storage devices with much
larger capacity. The recent change to the whole Storage System
and to BOS needed to support DSU-191's was able to find a free
bit; the next such change would require restructuring of many
system data bases.

For this reason the disk record address is being changed in
format. The current address is

device ID bit (*t) 

device address bit (18)

where the device ID, ranging from 1 to 7 (0 is used for null
addresses, and the high-order bit indicates that the page is on a
special device, e.g. the paging device), specifies the storage
subsystem (DSU-191, etc.) according to a table in the FSDCT.

The new address format expands from 21 bits to 36. It looks
like this:

device table index bit (18)
record address bit (18)

The old "device address" coded both disk drive number and address
on the disk pack into 18 bits; this has been changed so that we
have an effective width of 36 bits, with the device table index
used to select the proper disk drive and the record address being
strictly an offset within the volume. The "device ID" is located
in the device table, and selects the strategy and coding scheme
used to run the device. The new address format provides for up
to 256K devices on-line, each device having a capacity of 256K
records. Since the current capacity of a DSU-191 pack is about
20,000 records, we are prepared to support a tenfold increase in
the capacity of a single pack; if devices with more capacity are
produced, we can define several logical volumes on one physical
volume. The total amount of storage which can be supported by
the system increases by a factor of 32K, to about 281 quadrillion
characters.
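
The arithmetic behind the 256K and 32K figures can be checked
with a small Python sketch of the new 36-bit address. The
function names are hypothetical, and placing the device table
index in the high-order half of the word is an assumption made
only for illustration.

```python
INDEX_BITS = 18                       # device table index width
RECORD_BITS = 18                      # record address width
RECORD_MASK = (1 << RECORD_BITS) - 1


def pack_address(device_index, record):
    """Combine a device table index and a record address into 36 bits.

    Assumes (hypothetically) that the index occupies the high-order half.
    """
    if not 0 <= device_index < (1 << INDEX_BITS):
        raise ValueError("device table index does not fit in 18 bits")
    if not 0 <= record <= RECORD_MASK:
        raise ValueError("record address does not fit in 18 bits")
    return (device_index << RECORD_BITS) | record


def unpack_address(word):
    """Split a 36-bit address back into (device table index, record address)."""
    return word >> RECORD_BITS, word & RECORD_MASK


devices_or_records = 1 << 18          # 262144, i.e. 256K
growth_factor = (1 << 36) // (1 << 21)  # 32768, i.e. 32K
```

Each 18-bit field can take 262,144 (256K) values, and widening
the address from 21 to 36 bits multiplies the number of
addressable records by 2 to the 15th power, or 32K.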

The disk record address is never interpreted except in the
context of its own volume. Different volume-addressing schemes
and VTOC layouts could exist compatibly within the same
configuration on different volumes.

The "VTOC index" stored in a branch is used at segment
activation time to locate the correct VTOC entry on the volume.
Like the disk record address, this number can be interpreted only
in the context of the volume it refers to. The VTOC index might
be coded as a record address on the volume plus an offset, or as
a subscript in a fixed-length array, or as some sort of hash
address into the VTOC. Making this field 36 bits wide insures
that whatever clever coding scheme is used will have enough bits
available.



Command Changes 

The list and status commands should have options to list the
logical volume on which a segment resides, and to indicate
whether a segment is on-line. Some redefinition of the items
printed by the default invocation of the list command would be a
good idea, to insure that the command will reference only the
directory unless the user explicitly requests otherwise.

An active function "on_line" would be useful for checking
whether a segment is currently on-line.

A new command "set_vol" and an option to status to return
the logical volume ID will be needed to handle the volume ID
associated with each directory.

New commands are needed to request the mounting and
dismounting of volumes.



New Supervisor Entries

A new hcs_ entry must be provided, or status_ modified, to
return the name of the logical volume on which a segment resides.
New calls are also necessary to set and get the logical volume ID
associated with a directory.
ALGORITHMS

This section describes the sequences of operations performed
by the new Storage System for various system functions. Each
function is described in terms of its differences from the
current Storage System function.



Branch Creation

When append is called to create a new branch, it must
determine the correct volume for the storage associated with the
branch and allocate a VTOC entry on that volume. The VTOC
pointer to the new VTOC entry is then stored in the branch.

In order to create a segment, a user must have append
permission on the parent directory and meet the usual validation
level and security level constraints.

To determine the volume, append obtains the logical volume
name from the directory header. If more than one physical volume
is a member of the logical volume, the physical volume with the
most free space is chosen to receive the new segment (unless the
volume has a switch set which makes it appear "full," as may
happen when a logical volume is being compressed).
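
The volume selection rule can be sketched in a few lines of
Python. This is a minimal illustration, not the supervisor's
code: PhysicalVolume, its fields, and choose_volume are
hypothetical names, and free space is modeled as a simple record
count.

```python
from dataclasses import dataclass


@dataclass
class PhysicalVolume:
    name: str
    free_records: int
    appears_full: bool = False  # switch set, e.g., while being compressed


def choose_volume(members):
    """Pick the member of a logical volume that receives a new segment.

    Volumes whose "full" switch is set are skipped; among the rest, the
    one with the most free space wins. Returns None if none qualifies.
    """
    candidates = [v for v in members if not v.appears_full]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.free_records)


public = [
    PhysicalVolume("pack3", free_records=1200),
    PhysicalVolume("pack4", free_records=9000, appears_full=True),
    PhysicalVolume("pack5", free_records=4000),
]
```

With the sample data, choose_volume(public) selects pack5: pack4
has more free space but is skipped because its switch is set.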

Once the volume is determined, append locks the directory
and allocates the branch as it does now. Next, append calls the
VTOC_manager to request the creation of a VTOC entry on the
appropriate volume. If the VTOC for the chosen volume is full,
append returns an error code and does not create the branch.

The VTOC entry is initialized by the VTOC manager when the
entry is allocated. Once a VTOC entry has been allocated,
modifications to the VTOC entry are adequately protected by the
parent directory lock.



Making a segment known does not require a reference to its 
VTOC entry. 



Segment Fault

The system's processing of a segment fault has two parts.
First, the supervisor determines whether the segment faulted on
is active. If so, it is only necessary to connect the SDW for
the segment to the page table and return. If the segment faulted
on is not active, it must be activated. To determine whether a
segment is active, the supervisor will obtain the unique ID of
the segment from the KST entry and search the AST for the
segment.



To activate a segment the supervisor obtains the parent
directory's segment number and the location of the segment's
branch from the KST, locks the parent directory, and then calls
VTOC_manager to cause the VTOC entry for the segment to be read
into a wired buffer. Next, the system locks the AST and obtains
an ASTE of the appropriate size. (This step may cause some other
segment to be deactivated.) The ASTE is filled in from the VTOC
entry and the branch.

When the page table is being filled in, the system may
encounter null addresses in the file map. These are represented
as PTW's with a "null address" flag on, which the system will
check at page fault time.



Page Fault

When a process encounters a page fault, the supervisor
checks the PTW being faulted on to see if the "null address" flag
is on. If not, the disk record address from the PTW, together
with the Device Table index in the AST entry, is used to generate
a disk address for the device I/O.

If the supervisor encounters a fault on a PTW with the "null
address" flag, the supervisor will give the segment a block of
zeroed core.

When a page is being written out, the supervisor will
examine the page to see if it is all zero. When the current
supervisor detects this situation, it does not write the page,
but simply frees the disk address. This behavior is thought to
be the cause of many of the re-used address problems which the
current system encounters. In the new Storage System, zero pages
will be written back to disk, and a "zero page" flag set in the
PTW. Pages with this flag on will be freed at deactivation time,
that is, when the VTOC is updated to reflect the fact that the
record is no longer being used by this segment. This strategy
insures that all free records on disk are zero, so that damage to
a disk pack is much less likely to introduce old information into
a file.

When a page is to be written out, if the "null address" flag
is on, a disk record will be assigned on the appropriate volume.
If the page is all zeroes, of course, this step can be
eliminated.
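
The write-out decisions described above can be summarized in a
hedged Python sketch. The PTW class, the write_out function, and
the use of a dictionary as the "disk" are all hypothetical
stand-ins chosen for illustration; only the two flags and the
order of the checks come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PTW:
    null_address: bool = True     # no disk record assigned yet
    zero_page: bool = False       # record holds zeros; free it at deactivation
    record: Optional[int] = None  # disk record address within the volume


def write_out(ptw, words, assign_record, disk):
    """Write one page; return True if a disk write was performed."""
    all_zero = not any(words)
    if ptw.null_address:
        if all_zero:
            return False          # zero page with no record: nothing to assign
        ptw.record = assign_record()
        ptw.null_address = False
    # Zero pages ARE written, so the record on disk really contains zeros.
    disk[ptw.record] = list(words)
    ptw.zero_page = all_zero
    return True
```

A page that already has a record and becomes all zero is still
written; its record is only marked with the "zero page" flag so
that deactivation can release it later.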
Bounds Fault

Bounds faults will be greatly simplified in the new Storage
System. Since all file maps are full size, there is never a need
to re-allocate a file map in the middle of bounds fault
processing. This means that a bounds fault need only re-allocate
AST entries. Since the ASTEP has been removed from the branch, a
bounds fault does not need to modify the branch, and therefore the
directory need not be locked.



Deactivation 



When a segment is deactivated, any disk records
corresponding to PTW's with the "zero page" flag will be
released. The data in the ASTE are written back to the VTOC by
VTOC_manager, which does not lock the parent directory. A system
performance improvement can be expected for deactivation since
the branch need not be referenced; this change eliminates the
paging in and writing out of the directory page(s) for the
branch. In addition, the references to the directory header page
for the locking and unlocking operations on the parent directory
are eliminated.



Making a Segment Unknown

No change is made to makeunknown.



Truncation 

When a segment is truncated the pages of the segment will 
be explicitly zeroed and the zero pages written out to disk. 



Deletion

When a segment is deleted, it is first truncated, to insure
that the disk pages it occupies are zeroed. The sequence for
deleting the segment is:

lock directory
delete branch
call VTOC_manager to delete VTOC entry
unlock directory

It may be possible to unlock the directory before calling
VTOC_manager.
Directory Modifications

When the system crashes while a directory is being modified,
the salvager frequently finds the directory in an inconsistent
state. It is possible for the salvager to do a great deal of
damage to the hierarchy in the attempt to correct the directory.
What seems to happen is that a directory is locked, and the
supervisor starts making some change in the directory, for
example adding a new ACL entry, and gets far enough in this
operation so that the directory is inconsistent when the
directory page is removed from core in the normal course of core
management. If system operation is then interrupted, the page of
the directory which is on disk must be repaired by the salvager
before the directory can be used. Core pages which are flushed
to disk by emergency_shutdown may also lead to this situation.
In the current storage system, if emergency_shutdown succeeds,
all potentially inconsistent directories can be detected by
examination of the ASTEP in the branch and the lock in the
directory header. The new storage system eliminates these items,
and might appear to make the job of the salvager more difficult.

It would be far better from the point of view of reliability
if the inconsistent directory pages were never written to disk.
This might occasionally cause operations which the user thought
had completed just before a crash to be lost, but it would mean
that for almost all system crashes, no salvaging of the directory
structure was necessary.

To accomplish this goal, the operation of locking a
directory will set switches honored by page control which will
prevent any pages of a directory which is locked from being
written out to disk. (The pages may be claimed if they have not
been modified, and if a paging device is available the pages may
be moved to the paging device.) These switches must be respected
by page control and emergency_shutdown. (If pages of locked
directories may go to the paging device then the salvager must
respect such a switch in the paging device map too.)



Paging Device Management

No significant changes are planned to paging device
management.

Volume Mounting

When a user wishes to request the mounting of a logical
volume, the pattern works somewhat like that proposed for tape.
The request is validated by ring 1 and passed to the system
control process, where a message is typed to the operator. When
the operator has mounted the volume, he issues a command to
inform the system that the physical volume is mounted. The
supervisor will translate requests for the mounting of a logical
volume into multiple requests for the mounting of physical
volumes, if necessary.

Registration information, including an access control list,
will be maintained for each logical volume, in a ring 1 data
base. This information will include the list of physical
volumes, owner identification, and access control and security
information.



Volume Connection

Volumes are connected to the Storage System by the
supervisor either at system initialization or in response to a
user mount request call passed from ring 1. After verifying that
the drive is ready and that the volume label, VTOC, and Volume
Map are correct and self-consistent, the system makes an entry in
the Device Table showing that the volume is on-line.

Before a connection is made, each VTOC entry is validated
(its current segment length and number of pages must agree with
the file map), and the VTOC file maps are then checked against
the Volume Map. If a file map address from a VTOC entry points
to a disk record marked free in the Volume Map, the record will
be marked as used. If two file map addresses point to the same
Volume Map entry, the Volume Map and both VTOC entries will have
the record freed, and the record will be zeroed.
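
The cross-check of file maps against the Volume Map can be
sketched as follows. This is an illustration only: check_volume
is a hypothetical name, file maps are modeled as Python lists
(None standing for a null address), the Volume Map as a
dictionary of "free"/"used" marks, and the per-entry length
validation is omitted.

```python
def check_volume(vtoc, volume_map):
    """Reconcile VTOC file maps with the Volume Map before connection.

    vtoc:       {segment uid -> list of record addresses or None}
    volume_map: {record address -> "free" or "used"}
    Returns the set of records that must be zeroed on disk.
    """
    owners = {}
    for uid, file_map in vtoc.items():
        for slot, record in enumerate(file_map):
            if record is not None:
                owners.setdefault(record, []).append((uid, slot))

    to_zero = set()
    for record, refs in owners.items():
        if len(refs) > 1:
            # Claimed twice: free it everywhere and zero it.
            for uid, slot in refs:
                vtoc[uid][slot] = None
            volume_map[record] = "free"
            to_zero.add(record)
        else:
            # Claimed once: make sure the map shows it as used.
            volume_map[record] = "used"
    return to_zero
```

A record claimed by exactly one file map is forced to "used"; a
record claimed twice is taken away from both claimants, since
the system cannot tell which one is the rightful owner.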



Volume dismounting may be the result of an explicit request
by the user who mounted a volume, or the dismount request may be
issued by a privileged process.

The supervisor must not allow the dismounting of a volume if
there are still pages in core or on the paging device which have
not yet been written to the device. Each volume will have a
switch which prevents any more activations. A program similar to
shutdown can then set the switch and loop through the AST
deactivating segments on a volume which is to be dismounted.

Once a volume has had all its segments deactivated and all
pages flushed, it is safe to dismount it. Any known segments
which have been dismounted will cause seg_fault_error conditions
if they are referenced.
File System Initialization

The program initialize_dims is called to start up the
Storage System. Its first step is to read the CONFIG deck and
initialize the disk DIM's for each disk type listed on a PRPH
card. It then reads the INTK card, determines the correct
partition, and locates the PART card for the partition. The PART
card lists the logical volumes which are in the partition; the
first logical volume listed must contain the root. The logical
volumes are described by VOL cards which tell which physical
units contain the physical storage for the logical volume. Each
volume is connected to the system, starting with the root volume.

The root volume contains a pointer in the volume label to
the VTOC entry for the root directory. It may be possible to
have a special segment which contains a root branch, in order to
eliminate various pieces of complication in directory control.



Disk Record Assignment

When the system attempts to write out a page which has the
"null address" flag on, the supervisor will assign a disk record
on the volume where the segment resides. For each volume
connected to the system, the supervisor keeps a pool of free
addresses in wired-down core. As record addresses are needed,
they are withdrawn from the pool. If a pool becomes empty, the
supervisor replenishes it by reading in a section of the Volume
Map and noting the free addresses. The pool is also added to by
pages released at truncation and deletion, and zero pages freed
at deactivation.
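
A per-volume free pool of this kind might look like the
following Python sketch. The FreePool class and its methods are
hypothetical; the Volume Map is modeled as a list of booleans
(True meaning free), and marking a record as taken when it
enters the pool is an assumption made to keep the example small.

```python
class FreePool:
    """In-core pool of free record addresses for one volume (sketch)."""

    def __init__(self, volume_map, section=4):
        self.volume_map = volume_map  # list of bool: True = free on disk map
        self.section = section        # map entries read per replenishment
        self.cursor = 0               # next map section to read
        self.pool = []

    def _replenish(self):
        # Read successive map sections until at least one free record is found.
        while self.cursor < len(self.volume_map) and not self.pool:
            end = min(self.cursor + self.section, len(self.volume_map))
            for record in range(self.cursor, end):
                if self.volume_map[record]:
                    self.volume_map[record] = False
                    self.pool.append(record)
            self.cursor = end

    def withdraw(self):
        """Return a free record address, consulting the map only if empty."""
        if not self.pool:
            self._replenish()
        if not self.pool:
            raise RuntimeError("no free records on volume")
        return self.pool.pop()

    def release(self, record):
        """Add a record freed by truncation, deletion, or deactivation."""
        self.pool.append(record)
```

Most withdrawals are satisfied from the in-core list; the map is
read only when the list runs dry.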

Since page faults cannot claim very many pages a second
without exhausting the system's free space, the number of times
that the Volume Map must be consulted should be low when the
system is in steady state.



Access to the VTOC

The VTOC manager is a new program which will be responsible
for all accesses to the VTOC entries. It will have wired core
buffers of its own, and will access the VTOC by a special I/O
facility which will use 64-word disk reads and writes instead of
1024-word I/O.

When a request to read a VTOC entry is made, the VTOC
manager will first search the AST to see if the segment is
active, and if so will reconstruct the VTOC entry from the ASTE
and return the VTOC entry to the caller. If the segment is not
active, the core buffers will be checked to see if the VTOC entry
is recently used and still in core. If not, a disk I/O request
will be issued for the VTOC entry; it will be read into a free
core buffer if one is available. If no buffer is available, the
oldest buffer will first be written to disk (if modified) and
then the read performed. Each buffer will have a "modified bit"
so that a VTOC entry need not be rewritten unless it has changed.
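
The read path just described can be sketched in Python. This is
an illustration under assumptions, not the VTOC manager itself:
the class and method names are hypothetical, dictionaries stand
in for the AST and for 64-word disk I/O, and "oldest" is taken
to mean the buffer that was read in earliest.

```python
from collections import OrderedDict


class VtocManager:
    def __init__(self, disk, active, nbuffers=2):
        self.disk = disk              # key -> VTOC entry on disk
        self.active = active          # key -> entry rebuilt from the ASTE
        self.nbuffers = nbuffers
        self.buffers = OrderedDict()  # key -> [entry, modified]; oldest first

    def read(self, key):
        if key in self.active:        # 1. active: reconstruct from the AST
            return self.active[key]
        if key in self.buffers:       # 2. still in a core buffer
            return self.buffers[key][0]
        if len(self.buffers) >= self.nbuffers:
            # 3. no free buffer: evict the oldest, writing it only if modified
            old_key, (old_entry, modified) = self.buffers.popitem(last=False)
            if modified:
                self.disk[old_key] = old_entry
        entry = self.disk[key]        # 4. read from disk into a buffer
        self.buffers[key] = [entry, False]
        return entry

    def modify(self, key, entry):
        if key in self.active:        # active segments are updated in the AST
            self.active[key] = entry
            return
        self.read(key)                # make sure the entry is buffered
        self.buffers[key] = [entry, True]
```

The modified bit defers the disk write until the buffer is
reclaimed, and unmodified buffers are simply dropped.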

The contents of the VTOC entry can be completely
reconstructed from the AST entry, so that when a segment is
active, we can assume that the only copy of the VTOC entry exists
in the AST. This property allows us to write VTOC entries out to
disk from the AST entry data without having to first read in the
VTOC entry in order to update it.

Although the use of 64-word I/O in addition to the standard
1024-word I/O used by the paging mechanism adds some code and
some complexity to the supervisor, it provides several important
advantages for the management of VTOC information. Obviously,
the use of small "pages" for VTOC entries cuts down on the amount
of disk channel busy time and memory load, for any given rate of
access to the VTOC. The amount of wired-down storage needed to
buffer VTOC entries in core is decreased. But the most important
effect is to eliminate the unnecessary transportation of VTOC
entries and file maps for segments which are not being activated.
Our experience to date suggests that data are most often
destroyed when they are in core, or when they are transported to
and from core. Using 64-word I/O makes it more likely that a
system crash will destroy only segments which were actually in
use at the time of the crash.



Locking 

The locking hierarchy looks like this:

directory lock
parent directory lock
root directory lock
AST lock
VTOC manager lock
page control global lock
traffic control lock

The VTOC manager lock and the page control global lock will be on
the same level - that is, there is never an attempt to lock both
of these at once.
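
One way to check such a hierarchy mechanically is sketched below
in Python. It rests on an assumption the text does not state
outright: that locks are acquired in the order listed, so a
process may only take a lock at a strictly later level than any
lock it already holds. LOCK_LEVEL and LockOrder are hypothetical
names.

```python
# Levels follow the listing order (assumed to be the acquisition order).
LOCK_LEVEL = {
    "directory": 0,
    "parent directory": 1,
    "root directory": 2,
    "AST": 3,
    "VTOC manager": 4,         # same level as page control:
    "page control global": 4,  # never held together
    "traffic control": 5,
}


class LockOrder:
    """Tracks the locks one process holds and rejects out-of-order requests."""

    def __init__(self):
        self.held = []

    def acquire(self, name):
        level = LOCK_LEVEL[name]
        highest = max((LOCK_LEVEL[h] for h in self.held), default=-1)
        if level <= highest:
            raise RuntimeError(f"lock order violation acquiring {name!r}")
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)
```

Because the comparison is strict, two locks on the same level,
such as the VTOC manager lock and the page control global lock,
can never be held at once.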

Carrying Packs Between Sites

Carrying packs between sites will be tricky. The VTOC
entries are valid, but what must be done is to construct branches
for each VTOC entry, and fill in VTOC pointers and valid unique
ID's. This operation must be privileged and done carefully. The
VOLUID must also be changed, since different installations may
have assigned the same VOLUID to different packs.

One way to construct the new branches to describe the
contents of a volume which has been imported into a site would be
to require the user to run a program which looks much like
backup_dump, which would write a small tape containing the
directory information only for all segments on the volume. Then
the user would carry both a pack and a small tape reel. A
convention could be established to permit the supervisor to place
the contents of this tape on the pack itself.

The ability to carry packs between sites will not be part of
the initial implementation.



The quota mechanism will have the same basic elements as the
current scheme; that is, all segments in the same directory will
be charged to the same "quota cell", consisting of

maximum records used
current records used
time-record product
time last updated

Since all non-directory segments in the same directory must
reside on the same logical volume, one quota cell per directory
is sufficient. It will be stored in the VTOC entry and the
branch for the directory when the directory is not active, or in
the AST entry when the directory is activated. The storage for
pages of directories themselves is always on the logical volume
which contains the root. In order to prevent any user from
monopolizing storage on the root volume, each directory will
actually have two quota cells, one for directory pages only, and
the other for pages of non-directory segments. As in the current
system, there will be a value of quota which means that there is
no limit on the storage in this directory, but that some
higher-level directory's limit must be checked.
Figure 2: Quota Information for Each Directory



Within this framework, it is possible (but not necessary) to
make a major improvement to the current quota mechanism which
should provide users with significantly more flexibility in
controlling their disk usage. Currently, when [...]
finds a quota which is nonzero, the storage [...]
inferior to the directory with the quota i[...]
quota, and no further checks are made. In the [...]
Storage System then continues up the hierarchy, as long as the
logical volume identification matches, checking quota at each
level. There is no movequota operation, and quotas may be set
freely at any level by any user with modify [...]
directory.
An example will make the use of this facility clear.
Suppose that a project is to be given a maximum quota of 100
records. The system administrator sets the project directory's
quota to 100. Now, suppose there are 19 users on the project.
The project administrator may then set a quota of 20 on each user
home directory. Any user may use up to 20 records of storage,
provided that the project's total usage does not exceed 100
records. Thus, a user could possibly encounter several different
record quota overflows, either from his home directory, or from
the project directory, or from higher directories. In practice,
the quotas for the root and for >udd will be set to "infinite".
Only the quota cell nearest to the segment will accumulate a
time-record product.

Since the chain is broken when a directory with storage on a
different logical volume is encountered, the use of dismountable
volumes does not affect the normal quota mechanism on system
storage.
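
The multi-level check in the example can be sketched in Python.
This is a hedged illustration: Directory and charge are
hypothetical names, an "infinite" quota is modeled as None, and
keeping a running count of records at every level of the chain
is an assumption made so that each level's limit can be tested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Directory:
    name: str
    volume: str                  # logical volume holding its segments
    quota: Optional[int]         # None models the "infinite" setting
    used: int = 0                # records charged at or below this level
    parent: Optional["Directory"] = None


def charge(directory, records):
    """Charge records to a directory, checking quota at each level.

    Walks up the hierarchy while the logical volume matches. Returns the
    name of the first directory whose quota would overflow, or None if
    the charge succeeds (in which case every level is updated).
    """
    chain = []
    d = directory
    while d is not None and d.volume == directory.volume:
        if d.quota is not None and d.used + records > d.quota:
            return d.name
        chain.append(d)
        d = d.parent
    for d in chain:
        d.used += records
    return None
```

With a project quota of 100 and home directory quotas of 20, a
sixth user is refused by the project directory once five users
have each used 20 records, even though his own quota is
untouched. A directory on a different logical volume stops the
walk, so its usage is never charged to the project.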
Fixing the File Hierarchy

When system operation is interrupted abruptly, the Storage
System data bases may have been left in an inconsistent state.
We have attempted to eliminate states which are inconsistent, or
to minimize the amount of time the system spends in these states.
Procedures which verify that the hierarchy is correct and repair
it if necessary will continue to be needed, though, because
hardware and software errors can occur which violate any of our
assumptions.

There are four repair operations which must be worked out:
emergency shutdown, pack salvage, tree salvage, and tree-VTOC
salvage.



EMERGENCY SHUTDOWN 

The emergency shutdown mechanism will work about the same as
it does now. When the system crashes and ESD is invoked, an
attempt will be made first to update the Volume Map on each
mounted volume. If this operation succeeds, an attempt will be
made to flush out all core pages, and then to deactivate all
segments (and update the VTOC's).



PACK SALVAGING 

This operation is performed whenever a volume is connected 
to the system; it should take only a few seconds. It consists 
of reading through the VTOC for the volume, examining each 
file map, and checking the Volume Map entry for each page in the 
file map to make sure that the page is recorded as being used by 
the VTOC entry and is pointed to by only one file map address. 
In all salvaging operations, it is not necessary to use 64-word 
I/O, and the use of virtual I/O will make the code clearer and 
more obvious. 
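
A minimal sketch of the consistency check just described, under 
simplifying assumptions: the VTOC is modelled as a mapping from a 
VTOC index to that entry's file map (a list of record addresses), 
and the Volume Map as a list with one in-use flag per record. The 
function name salvage_pack and the problem labels are hypothetical. 

```python
def salvage_pack(vtoc, volume_map):
    """Cross-check every file map against the Volume Map.

    vtoc:       dict mapping VTOC index -> list of record addresses
    volume_map: list of booleans, True when the record is marked in use
    Returns a list of (problem, vtoc_index, record) tuples.
    """
    owner = {}        # record address -> first VTOC index that claimed it
    problems = []
    for vtoc_index, file_map in vtoc.items():
        for record in file_map:
            # The page must be recorded as used in the Volume Map.
            if not volume_map[record]:
                problems.append(("not_marked_used", vtoc_index, record))
            # The page must be pointed to by only one file map address.
            if record in owner:
                problems.append(("multiply_claimed", vtoc_index, record))
            else:
                owner[record] = vtoc_index
    return problems
```

The check makes a single pass over the VTOC and keeps one table 
of claimed records, which is consistent with the expectation that 
the operation takes only a few seconds per volume. 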

In "long salvage" mode each disk record which is marked 
allocated in the file map can be checked to see if it is zero, 
and if so, the record can be released from the file map and the 
VTOC entry adjusted. Free records can be checked to make sure 
that they are zero. If a record which should be zero is found 
nonzero, the data cannot be restored to its rightful owner; but 
such a find is evidence that the system has probably lost some 
data. 



TREE SALVAGING 

This operation is like the current salvager. Starting with 
the root, the directory hierarchy is scanned and each entry is 
checked for validity. Directory hash tables, ACLs, etc. are 
rebuilt if necessary. If a branch cannot be made valid, the 
branch is deleted. If a directory cannot be made valid, the 
directory's branch is deleted and the directory segment deleted. 



TREE-VTOC SALVAGING 

This operation is done after volume and tree salvaging. The 
directory hierarchy is walked and for each branch, the VTOC entry 
is located and the UID match checked. Then, the VTOC for the 
system volume is scanned, and VTOC entries not visited during the 
tree walk are examined. For each such entry, there is no valid 
branch. An attempt is made to construct such a branch by 
examining the UID pathname in the VTOC extension; this may point 
to a valid branch in a directory which has become detached from 
the root by accident, or it may point to garbage. The salvager 
follows the parent UIDs back until it finds the break in the 
hierarchy, and constructs a new branch for the segment or subtree 
in ">lost_and_found". Since the VTOC extension contains the 
primary entry name for the branch, we may even be able to 
rebuild the branch in the correct place. 
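
The two passes described above might be sketched as follows. The 
data layout is an assumption made for illustration: the tree walk 
yields (UID, VTOC index) pairs, the VTOC is a mapping from VTOC 
index to UID, and the UID pathname of the parent is a list of 
UIDs ordered from the root downward. The names find_orphans and 
find_break are hypothetical. 

```python
def find_orphans(branches, vtoc):
    """Compare the tree walk against the VTOC.

    branches: iterable of (uid, vtoc_index) pairs seen during the walk
    vtoc:     dict mapping VTOC index -> UID stored in that VTOC entry
    Returns (mismatches, orphans): branches whose UID does not match
    their VTOC entry, and VTOC indexes never visited by the walk.
    """
    visited = set()
    mismatches = []
    for uid, vtoc_index in branches:
        visited.add(vtoc_index)
        if vtoc.get(vtoc_index) != uid:
            mismatches.append((uid, vtoc_index))
    orphans = sorted(index for index in vtoc if index not in visited)
    return mismatches, orphans


def find_break(parent_uid_path, reachable):
    """Follow parent UIDs back toward the root until a reachable one is met.

    parent_uid_path: UIDs of the ancestors, root first, parent last
    reachable:       set of directory UIDs found by the tree walk
    Returns the UID of the highest detached ancestor (the subtree to be
    given a new branch), or None when the immediate parent is reachable
    and only the segment itself has lost its branch.
    """
    detached = None
    for uid in reversed(parent_uid_path):
        if uid in reachable:
            break
        detached = uid
    return detached
```

When find_break returns None, only the segment's own branch is 
missing, and the primary name kept in the VTOC extension may allow 
the branch to be rebuilt in its original directory. 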

In "long" mode all mounted volumes are processed. When 
speed is important, the salvager can check only the system 
volume. If this operation is fast enough, we will do it every 
time we boot the system. 



Backup 

Complete and catchup dumps can be replaced by physical dumps 
(like the current BOS SAVE) for backup of most of the system. A 
retriever can be written which will retrieve a segment from one 
of these tapes given a VTOC index. It may even be possible to 
run these dumps without shutting down, if users can accept the 5 
minutes' wait which would be required for the satisfaction of a 
segment fault encountered while a pack was being dumped. 

An option to allow a user of a private dismountable pack to 
request that his volume be dumped to tape would be desirable; 
this might be an offline utility request. 

The directory structure of the system can be backed up by a 
"skeleton dump" similar to the current dump programs, but which 
dumps only directory data, not segments. Incremental backup 
dumps can be run if the installation wishes to provide protection 
against the accidental deletion of segments. 

The Storage System will have an option to cause specified 
volumes to be written in duplicate on more than one physical 
drive. A moderately large installation can cause a duplicate 
copy of the root logical volume to be kept, and the system will 
then be protected from disk catastrophes involving the directory 
hierarchy. A later improvement might involve allowing the system 
to automatically switch to the backup copy if the primary copy 
went bad; such a proposal can be made later if experience shows 
that the facility is useful. 



Device Reservation 

Because there will be very little incentive for a user to 
dismount a volume, unless the installation sets a very high price 
for use of a dismountable volume, and because many users may be 
using a volume's contents when it is mounted, most installations 
will wish to establish some sort of schedule for permission to 
use the disk drives which are available for user mounting. An 
automatic device-reservation system to handle the scheduled 
forced dismount of volumes on these drives and permission to 
mount new volumes will be a necessity. 

The interaction of this facility with the Access Isolation 
Mechanism must be considered carefully. 



A BOS utility must be written to initialize a volume for use 
with the new storage system. This utility must be able to label 
volumes and build VTOCs and Volume Maps. It should be able to 
zero an entire volume as well. 



Error Recovery 

Several improvements are planned for the disk DIMs so that 
when a disk drive or pack goes bad, the system will attempt to 
keep running. One consequence of this desire is that the 
supervisor will attempt to discover when it has typed out, say, 
ten disk error messages for the same disk address or address 
range, and automatically suspend use of this part of the disk 
until made to start again. (Of course, this cannot be done for 
the root volume.) Moving packs from one spindle to another is a 
dangerous activity, especially when disk errors are occurring; 
but sometimes this will cure disk problems, and it would be nice 
to have the system well enough organized so that such a swap 
could be made without crashing or shutting down. A special 
interrupt or an operator command could be used to tell the system 
to start retrying its I/O after the operator had attempted to 
correct the problem. 
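
The suspension rule above can be illustrated with a small 
counter. This is a sketch only: the class DiskErrorMonitor, the 
grouping of addresses into fixed-size ranges, the range size of 64 
records, and the volume name "root" are all assumptions made for 
the example; only the threshold of ten and the exemption of the 
root volume come from the text. 

```python
from collections import Counter


class DiskErrorMonitor:
    """Suspend a disk address range after repeated error reports."""

    def __init__(self, threshold=10, range_size=64, root_volume="root"):
        self.threshold = threshold        # errors tolerated per range
        self.range_size = range_size      # hypothetical range granularity
        self.root_volume = root_volume    # never suspended
        self.errors = Counter()
        self.suspended = set()

    def _key(self, volume, address):
        return (volume, address // self.range_size)

    def report_error(self, volume, address):
        """Count one error; return True if the range is now suspended."""
        key = self._key(volume, address)
        self.errors[key] += 1
        if volume != self.root_volume and self.errors[key] >= self.threshold:
            self.suspended.add(key)
        return key in self.suspended

    def is_suspended(self, volume, address):
        return self._key(volume, address) in self.suspended

    def resume(self, volume, address):
        """Operator command or special interrupt: retry I/O on the range."""
        key = self._key(volume, address)
        self.suspended.discard(key)
        self.errors[key] = 0
```

The resume operation stands in for the special interrupt or 
operator command mentioned above; it clears the error count so 
that a corrected drive is not suspended again at once. 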
Operator Commands 

The following operator commands must be provided: 

reply to disk mount message 
list mounted disks 



The following commands must be provided for system 
programmer and system administrator use: 

list mounted disks 
list device table 
list device reservations 
force device dismount 
force online pack salvage 
force device reservation 
Data Bases 

This section describes the format of various system data 
bases. 



Configuration Cards 

The following is an example of the configuration deck for 
the new Storage System: 

INTK 0 MULT 

PART MULT V1 V2 V3 V4 V5 
PART SALV V6 

   . 
   . 

VOL V1 SRV DISK 0 0 404. 
VOL V1 SRV DISK 1 0 
VOL V2 STO DISK 2 0 404. 
VOL V3 STO DISK 3 0 404. 
VOL V4 STO D18 0 0 202. 
VOL V5 SCR DISK 4 0 202. 
VOL V6 SAL DISK 4 202. 202. 

   . 
   . 

PRPH DISK A 23 191. (disk DIM info) 
PRPH D18 A 25 181. (disk DIM info) 

The INTK card tells the system what partition to use. The 
PART cards define which volumes make up the partition. If more 
than 13 volumes are in a partition, additional PART cards with 
the same partition name may be supplied. 

The VOL cards name the logical volumes, and specify their 
device type and location. In the example, "SRV" and "STO" are 
flags which describe the use to be made of the volume, and "DISK" 
and "D18" are (logical) device types which will be looked up in 
PRPH cards. The other parameters on the VOL card are pairs of 
<first-record, n-records> expressed in cylinders. 
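
A reader of such a deck might be sketched as below. Several 
details are assumptions, not part of the proposal: the first 
number after the device type is taken to be the drive number, a 
missing n-records value is kept as None, and a number with a 
trailing period is read as decimal while any other number is read 
as octal. The names Extent, parse_number and parse_deck are 
hypothetical. 

```python
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    volume: str
    use: str                    # flag such as "SRV" or "STO"
    device_type: str            # looked up in the PRPH cards
    drive: int                  # assumed meaning of the first number
    first: int                  # first cylinder of the extent
    count: Optional[int]        # number of cylinders, None if not given


def parse_number(token):
    """Trailing period means decimal; otherwise octal (an assumption)."""
    if token.endswith("."):
        return int(token[:-1], 10)
    return int(token, 8)


def parse_deck(text):
    """Return (partitions, extents) from PART and VOL cards."""
    partitions = {}
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PART":
            # Repeated PART cards with the same name extend the list.
            partitions.setdefault(fields[1], []).extend(fields[2:])
        elif fields[0] == "VOL":
            volume, use, device_type = fields[1:4]
            numbers = [parse_number(token) for token in fields[4:]]
            count = numbers[2] if len(numbers) > 2 else None
            extents.append(
                Extent(volume, use, device_type, numbers[0], numbers[1], count)
            )
    return partitions, extents
```

Because PART cards accumulate by partition name, a partition 
with more volumes than fit on one card is handled by the same 
loop without special treatment. 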



Directory Branch 

Several data items now stored in the branch for a segment 
will be moved to the VTOC entry associated with the branch. 
There is a one-to-one correspondence between branches and VTOC 
entries, and the directory lock protects the VTOC entries as well 
as the directory branches. 

The directory information relating to the logical 
organization of the data represented in the Storage System will 
be stored in the branch; the information about the physical 
storage will reside in the VTOC for the volume containing the 
storage. 

In particular, the following items will be removed from the 
branch: 

file map 
device ID 
date/time modified 
date/time used 
current length 
records used 
AST entry pointer 

Most of these items will be moved to the VTOC entry, except for 
the ASTEP, which is eliminated. Instead of inspecting the 
directory to see if a segment is active, the supervisor will 
search the AST for the unique ID of the segment. (A hash table 
for the AST may be implemented to make this fast.) 

In order to enable the supervisor to find the VTOC entry 
given a branch, the directory entry will have a new item added: 
a VTOC pointer stored in the branch which locates the VTOC entry. 

For the branch for a directory segment, some items will be 
added. The maximum records used for both directory and 
non-directory records, and the logical volume identifier for 
non-directory records will be added to the directory branch in 
order to make the activation of a directory which has a quota 
simple. (Usage, time-page product, etc. will go in the 
corresponding VTOC entry.) 

One advantage of the division of information between the 
VTOC and the branch is that directory branches need not be 
modified when a segment is activated and need not even be 
referenced when a segment is deactivated. In order to prevent 
any directory page from being modified at segment-activation 
time, the directory lock will also be moved to the AST entry for 
the directory (since a new rule will be that a directory cannot 
be deactivated while it is locked). This change should reduce 
the paging traffic on the system, and will reduce the chances of 
a directory page being damaged due to memory parity or disk 
channel errors, since the page need not be written back to disk 
after use. 

One problem with this division of data is that the length 
information for a segment is kept in the VTOC, and so the 
operation of listing a directory requires the fetching of each 
VTOC entry corresponding to an entry in the directory. As 
compared to the current storage system, the new system will have 
to do noticeably more I/O to return the same information. 
Furthermore, the real-time delays associated with functions which 
list a directory will increase significantly. It may be 
necessary to store some of the length information in duplicate in 
both the branch and the VTOC entry in order to allow the simplest 
cases of directory listing to operate without referencing the 
VTOC. Another alternative would be to change the list command so 
that the default case does not provide any information which is 
costly to obtain, and to provide new supervisor interfaces to 
replace hcs_$star, which return only information kept in the 
directory. 



Directory Header 

Very little change will be made to the directory header. As 
mentioned above, the directory lock will be moved from the 
directory header to the AST entry in order to avoid unnecessary 
modification of directory pages. The per-directory static 
multilevel meters will be removed because nobody uses them. 

The quota information now in the directory header will be 
moved to the branch for the directory in the directory's parent, 
or to the VTOC entry corresponding to the branch. 

Each directory will gain one new item: the name and unique 
ID of the logical volume where segments inferior to the directory 
will be stored. This datum is also kept in the branch for the 
directory, because it is used by the quota mechanism. This 
attribute may be changed only for empty directories. Modify 
permission on the directory is necessary in order to change it, 
and it may not be changed to an arbitrary value; the user 
changing the logical volume ID must be listed on the extended ACL 
of the VDS for the volume, if the volume is a private volume. 



The VTOC will be organized as a parallel set of fixed-size 
arrays in a special region of the volume not available for 
regular storage. One array will contain the VTOC information 
used during normal operation, and the other arrays, called the 
VTOC extension, will be used to hold the special salvaging 
information. 



VTOC Entry 

The VTOC entry for a segment will contain the following 
items: 

unique ID 
date/time segment modified 
date/time segment used 
file map 
current length 
records used 
directory switch 
quota information (2 sets): 
   records used 
   time-record product 
   time trp last updated 
*primary name of segment 
*unique ID pathname of parent 

The items marked with an asterisk will be stored in the VTOC 
extension for the convenience of the salvager. All other items 
can be reconstructed from the AST entry contents, so that 
deactivation does not require any reference to the directory 
branch. 



File Map 

The file map in the VTOC entry will use only 18 bits per 
record address instead of 36, because the device ID can be 
eliminated. The file maps in every VTOC entry will be maximum 
size, rather than the current situation where variable-size file 
maps are permitted. Only 256K segments will actually use more 
than 64 words of VTOC entry, but all 192 words will be read in by 
the VTOC manager because it won't know the length of the VTOC 
entry. 
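
The space taken by a maximum-size file map follows from the 
figures above: a 256K-word segment of 1024-word pages has 256 
pages, and 256 addresses of 18 bits each occupy 128 words of 36 
bits, leaving 64 of the 192 words for the other items. The 
following sketch shows one way two addresses could share a word. 
The packing order, the use of zero for unused slots, and the 
function names are assumptions made for illustration. 

```python
PAGES_PER_SEGMENT = 256     # 256K words in 1024-word pages
ADDRESS_BITS = 18
WORD_BITS = 36
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
FILE_MAP_WORDS = PAGES_PER_SEGMENT * ADDRESS_BITS // WORD_BITS   # 128


def pack_file_map(addresses):
    """Pack up to 256 record addresses, two per 36-bit word."""
    if len(addresses) > PAGES_PER_SEGMENT:
        raise ValueError("too many pages for one segment")
    for address in addresses:
        if not 0 <= address <= ADDRESS_MASK:
            raise ValueError("record address does not fit in 18 bits")
    # Unused slots are filled with zero (a placeholder choice).
    padded = list(addresses) + [0] * (PAGES_PER_SEGMENT - len(addresses))
    return [
        (padded[i] << ADDRESS_BITS) | padded[i + 1]
        for i in range(0, PAGES_PER_SEGMENT, 2)
    ]


def unpack_file_map(words):
    """Recover the 256 record addresses from the packed words."""
    addresses = []
    for word in words:
        addresses.append(word >> ADDRESS_BITS)
        addresses.append(word & ADDRESS_MASK)
    return addresses
```

Since every file map has the same length, the VTOC manager can 
read a fixed 192 words per entry without first consulting a 
length field. 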



Volume Map 

The Volume Map for a volume has one entry for each record on 
the volume. The current system's analogue to the volume map is a 
wired-down data base, the bit map portion of the FSDCT, which has 
one bit for each record. As the amount of physical storage in a 
configuration increases, this data base becomes too large to wire 
down, and so it will be allowed to reside on the volume it 
describes. 



Page Table Word 

Since a full disk address will no longer fit into 22 bits 
(18-bit address plus 4-bit device ID) as it does in the current 
system, the format of a PTW for a page which is not in core must 
change slightly. 

The new format of the PTW has the 18-bit volume address 
only; the index into the device table which completes the disk 
address is the same for all pages of the segment and so goes in 
the AST entry. 

A flag bit is turned ON in the PTW if the page is all 
zeroes. Such a page can be freed when the segment is 
deactivated. 

A flag bit is turned ON in the PTW if the page has never 
been assigned. If a reference is made to such a page, a page 
will be assigned at page out time. 
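
The change in addressing can be sketched as follows. The old 
form packs a 4-bit device ID and an 18-bit record address into 22 
bits; the new form keeps only the 18-bit address in the PTW and 
takes the device table index from the AST entry. The field order 
within the 22 bits, the class names, and the flag names are 
assumptions made for the example. 

```python
from dataclasses import dataclass

RECORD_BITS = 18
RECORD_MASK = (1 << RECORD_BITS) - 1
DEVICE_ID_BITS = 4


def pack_old_address(device_id, record):
    """Current system: 4-bit device ID plus 18-bit address in 22 bits."""
    if not 0 <= device_id < (1 << DEVICE_ID_BITS):
        raise ValueError("device ID does not fit in 4 bits")
    if not 0 <= record <= RECORD_MASK:
        raise ValueError("record address does not fit in 18 bits")
    return (device_id << RECORD_BITS) | record


@dataclass
class ASTEntry:
    device_table_index: int      # shared by every page of the segment


@dataclass
class PTW:
    record: int                  # 18-bit address within the volume
    all_zero: bool = False       # page may be freed at deactivation
    never_assigned: bool = False # page is assigned at page-out time


def disk_address(aste, ptw):
    """New system: (device table index, record address on that volume)."""
    if not 0 <= ptw.record <= RECORD_MASK:
        raise ValueError("record address does not fit in 18 bits")
    return (aste.device_table_index, ptw.record)
```

Keeping the index in the AST entry removes the sixteen-device 
limit implied by a 4-bit field without widening the PTW. 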



AST Entry 

Several items must be added to the AST entry to support the 
new Storage System. These include: 

device table index 
VTOC index 
directory lock 
date/time modified 
directory switch 
logical volume ID 
non-dir quota cell 
dir quota cell 

Several other items must be changed. The "dnzp" switch, if still 
necessary, changes in meaning, since zero pages are nulled at 
deactivation instead of page fault time. The "did" moves to the 
device table. The "ppml" and the "movdid" items are obsolete. 

The units for "csl" and "np" should probably be 16-word 
blocks instead of 1024-word pages, in case we ever experiment 
with changing page size. The "mlsw" flag should be renamed the 
"in_pdir" flag for clarity. 



The device table is a new wired data base which replaces 
some of the functions of the FSDCT in the current system. It has 
one entry for each disk drive available on the system. 

In each entry, the following information is kept: 

VOLUID 
LVID 
DIM type (device id) 
Volume state 
Disk DIM data: channel, drive, etc. 
sensitivity level and category 
Volume map location 
read-only switch 
system-volume switch 
number of free records 
volume coming down switch 
Volume Label 

The label for a volume in the new Storage System is checked 
when the volume is connected. It is located at a fixed address 
on the volume known to the DIM, and contains the addresses of the 
VTOC and the Volume Map. It also contains data used to verify 
that the volume is correctly mounted, such as 

VOLUID 
sensitivity level and category 
date/time initialized 
volume name 
manufacturer's serial number 
date/time last mounted 
date/time last salvaged 
error history, bad track list 



Disk Layout for DSU-191's 

The DSU-191 disks will be arranged to take advantage of the 
physical characteristics of the disk drive. The disk DIM for the 
191's will be the only module which knows what strategies have 
been used in arranging the data on the disk (except for BOS). 

Since the first four cylinders of a 191 pack are guaranteed 
error-free, the label for the volume will be placed somewhere in 
these four cylinders. The VTOC extension will also reside in 
this area. It is tempting to put the VTOC and volume map there 
too, in order to use the most reliable cylinders on the disk, but 
probably the VTOC and volume map should reside at the middle 
cylinders of the disk, in order to minimize average seek time. 



Volume Description Segment 

Each logical volume which can be mounted in response to a 
user request will have a corresponding Volume Description Segment 
in a per-system directory. The exact form of the 
volume-registration data base is currently being redesigned, but 
whatever volume-registration data base the system finally ends up 
with, the per-volume data will include 

Logical Volume ID 
List of Physical Volumes 
List of users who may set quotas for this volume 
Name, address, account number, etc. for billing 



